fix(schema): stepName truncates at byte boundary for multi-byte UTF-8 steps #104
greynewell merged 1 commit into main
… steps

stepName() used len(step)/step[:77] (byte operations) to enforce the 80-char limit on HowToStep names in JSON-LD. For instruction steps containing multi-byte UTF-8 characters (é, ñ, CJK, emoji), step[:77] could land inside a character's byte sequence, producing invalid UTF-8 that json.Marshal silently replaces with \uFFFD (U+FFFD).

The sentence-length guard (idx < 80) also compared a byte offset to 80, incorrectly rejecting short multi-byte sentences that were < 80 runes but > 80 bytes.

Fixes:
- Compute rune slice once: runes := []rune(step)
- Sentence guard: compare len([]rune(step[:idx+1])) < 80 (rune count)
- Truncation: string(runes[:77]) + "..." (rune slice, valid UTF-8)

Adds TestStepName covering ASCII truncation, sentence extraction, multi-byte truncation (valid UTF-8 + correct rune count), and multi-byte sentence guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
stepName() extracted the HowToStep.name field for JSON-LD recipe structured data using byte operations:
- len(step) > 80: checked byte count, not rune count
- step[:77]: byte slice that can land inside a multi-byte character
- idx < 80: compared a byte offset to a character limit

For instruction steps with multi-byte UTF-8 characters (é, ñ, CJK, emoji), step[:77] could split a character's byte sequence, producing invalid UTF-8 that json.Marshal silently replaces with \uFFFD.

Fix: compute runes := []rune(step), use len([]rune(step[:idx+1])) for the sentence guard, and string(runes[:77]) for truncation.

Test plan
- TestStepName covers ASCII truncation (77 + ... = 80 runes), sentence extraction, multi-byte step truncation (valid UTF-8, correct rune count), and multi-byte sentence guard
- TestParseDurationMinutes and TestComputeTotalTime still pass
- go build ./... passes

🤖 Generated with Claude Code